Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Authors

Mihail Popov

Chadi Akel

William Jalby

Pablo de Oliveira Castro

Abstract

The complexity of current architectures requires fine tuning of compiler and runtime parameters to reach full potential performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process that requires long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08× after tuning. Tuning a single codelet is 13× cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.6% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11× speedup with a 200× cheaper exploration.
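The difference between whole-program and piecewise tuning described in the abstract can be illustrated with a minimal sketch. This is not CERE's interface: the function names, the configuration labels, and the timings below are all hypothetical and chosen only for illustration. The sketch also assumes that region times simply add up and that the best choice for one region does not affect the others.

```python
# Hypothetical illustration only: this is not CERE's API, and the timings are made up.
from typing import Dict

# Measured time (seconds) of each region under each candidate configuration.
Timings = Dict[str, Dict[str, float]]


def best_whole_program(timings: Timings) -> str:
    """Pick the single configuration that minimizes the summed time of all regions."""
    configs = list(next(iter(timings.values())).keys())
    return min(configs, key=lambda c: sum(region[c] for region in timings.values()))


def best_per_region(timings: Timings) -> Dict[str, str]:
    """Pick, for each region separately, the configuration with the lowest time."""
    return {name: min(region, key=region.get) for name, region in timings.items()}


def total_time(timings: Timings, choice: Dict[str, str]) -> float:
    """Sum region times for a given region-to-configuration assignment."""
    return sum(timings[name][config] for name, config in choice.items())


# Made-up timings for two regions and two configurations (assumed names).
example_timings: Timings = {
    "loop_a": {"O2_8threads": 4.0, "O3_16threads": 3.0},
    "loop_b": {"O2_8threads": 2.0, "O3_16threads": 2.5},
}

if __name__ == "__main__":
    whole = best_whole_program(example_timings)
    whole_choice = {name: whole for name in example_timings}
    piecewise_choice = best_per_region(example_timings)
    print("whole-program:", whole, total_time(example_timings, whole_choice))
    print("piecewise:", piecewise_choice, total_time(example_timings, piecewise_choice))
```

Under these assumptions the piecewise assignment can never be slower than the best single configuration, because each region is free to keep the whole-program choice. In practice the gain depends on how faithfully each replayed region predicts the behavior of the original one.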

Similar resources

Piecewise holistic autotuning of parallel programs with CERE

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve best performance. Autotuning substantially improves default parameters in many scenarios but it is a costly process requiring long iterative evaluations. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces calle...

Profiling of Code_Saturne with HPCToolkit and TAU, and autotuning Kernels with Orio

This study has profiled the application Code_Saturne, which is part of the PRACE benchmark suite. The profiling has been carried out with the tools HPCToolkit and Tuning and Analysis Utilities (TAU) with the target of finding compute kernels suitable for autotuning. Autotuning is regarded as a necessary step in achieving sustainable performance at an Exascale level as Exascale systems most likel...

Performance Counter Monitoring for the Blue Gene / Q Architecture

This quarter’s newsletter features the performance area, which has a broad focus including autotuning, performance modeling, end-to-end performance measurement, and tool integration. We aim to extend and integrate mature, robust performance measurement technologies and develop a comprehensive performance data management framework that can be used by other areas of the SUPER project. End-to-end ...

Application-independent Autotuning for GPUs

Autotuning is an established technique for adjusting performance-critical parameters of applications to their specific run-time environment. In this paper, we investigate the potential of online autotuning for general purpose computation on GPUs. Our application-independent autotuner AtuneRT optimizes GPU-specific parameters such as block size and loop-unrolling degree. We also discuss the pecu...

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing modes...

Publication date: 2016
